An Efficient Cell-Based Clustering Method for Handling Large, High-Dimensional Data

Author

Jae-Woo Chang

Abstract

Data mining applications increasingly have to handle large amounts of high-dimensional data. However, most clustering methods for data mining applications do not work efficiently on large, high-dimensional data because of the so-called ‘curse of dimensionality’ [1] and the limitation of available memory. In this paper, we propose an efficient cell-based clustering method for handling a large amount of high-dimensional data. Our clustering method provides an efficient cell creation algorithm that uses a space-partitioning technique, and a cell insertion algorithm that constructs clusters as cells whose density exceeds a given threshold. To achieve good retrieval performance on clusters, we also propose a new filtering-based index structure that uses an approximation technique. In addition, we compare the performance of our cell-based clustering method with the CLIQUE method in terms of cluster construction time, precision, and retrieval time. The experimental results show that our clustering method achieves better performance on cluster construction time and retrieval time. Finally, our clustering method shows good performance on system efficiency, a measure that combines precision and retrieval time.
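
The idea of treating dense cells as clusters can be illustrated with a minimal sketch. It is not the paper's cell creation or cell insertion algorithm: it assumes a plain equal-width grid over a known bounding box, and the names `dense_cells`, `bins`, and `threshold` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def dense_cells(points, lower, upper, bins, threshold):
    """Map each point to a grid cell and keep cells with more than `threshold` points.

    Assumption: every dimension is split into `bins` equal-width intervals
    between lower[d] and upper[d]; the paper's partitioning may differ.
    """
    counts = Counter()
    for p in points:
        cell = []
        for x, lo, hi in zip(p, lower, upper):
            idx = int((x - lo) / (hi - lo) * bins)
            # Clamp so that x == hi (or slightly out-of-range values) stays in the grid.
            cell.append(min(max(idx, 0), bins - 1))
        counts[tuple(cell)] += 1
    # A cell is reported as a cluster only if its count exceeds the threshold.
    return {cell: n for cell, n in counts.items() if n > threshold}

points = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.05), (0.9, 0.9), (0.5, 0.5)]
clusters = dense_cells(points, lower=(0.0, 0.0), upper=(1.0, 1.0), bins=4, threshold=2)
print(clusters)
```

Only occupied cells are stored in the counter, so memory grows with the number of non-empty cells and not with the full grid size, which matters when the number of dimensions is large.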

Similar resources

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large datasets has always been one of the most important challenges in different sciences,...

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Publication date: 2003
